
1830 - XML - html5 DOCTYPE

Every HTML5 document contains the following dtd-less DOCTYPE declaration (see attachment):

<!DOCTYPE html>

This makes the XML parser spit out a lot of errors about undeclared elements. The XML spec says that a document with a dtd-less DOCTYPE declaration can still be well-formed. Since this is a very common case, it should be handled more gracefully. If there is no (internal or external) DTD, it would be more useful to stop validating and only report errors in well-formedness, so these do not get lost.

See also http://www.w3.org/TR/html51/syntax.html#the-doctype
and http://www.w3.org/TR/REC-xml/#NT-doctypedecl

Submitted will69 - 2015-12-22 09:03:20.526000 Assigned kerik-sf
Priority 5 Labels
Status pending Group
Resolution fixed

Comments

2015-12-22 09:11:15.312000
will69


~~~~
<!DOCTYPE html>
~~~~

2016-01-31 15:21:18.153000
kerik-sf

Long story short: it doesn't seem possible to disable dtd validation on the fly without seriously hacking Xerces-J. So I have to interrupt parsing and reparse when detecting the empty html doctype.

Also the next thing you'll want is the charset:
~~~~
<meta charset="UTF-8">
~~~~
and that's not well-formed xml.

So why not stick with the html sidekick?
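
For readers following along, the interrupt-and-reparse idea described above can be sketched with the SAX parser bundled in the JDK. This is only an illustration, not the plugin's actual code: the class `DtdLessReparse`, the marker exception `DtdLessDoctype` and the `Handler` class are hypothetical names, and the rule "no external ID and no element declarations means there is nothing to validate against" is an assumption of this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DtdLessReparse {

    /** Hypothetical marker exception that aborts the validating pass. */
    static final class DtdLessDoctype extends SAXException {
        DtdLessDoctype() { super("DOCTYPE declares nothing to validate against"); }
    }

    /** Records parser errors and can abort at the end of an empty DOCTYPE. */
    static final class Handler extends DefaultHandler2 {
        final List<String> errors = new ArrayList<>();
        private final boolean abortOnDtdLess;
        private boolean hasExternalId;
        private boolean sawElementDecl;

        Handler(boolean abortOnDtdLess) { this.abortOnDtdLess = abortOnDtdLess; }

        @Override public void startDTD(String name, String publicId, String systemId) {
            hasExternalId = publicId != null || systemId != null;
        }
        @Override public void elementDecl(String name, String model) { sawElementDecl = true; }
        @Override public void endDTD() throws SAXException {
            // Assumption of this sketch: without an external ID or any element
            // declaration, validating would only produce "undeclared" noise.
            if (abortOnDtdLess && !hasExternalId && !sawElementDecl) throw new DtdLessDoctype();
        }
        @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
        @Override public void fatalError(SAXParseException e) { errors.add(e.getMessage()); }
    }

    /** Runs one pass; returns false if the pass was aborted by DtdLessDoctype. */
    static boolean run(String xml, boolean validate, Handler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(validate);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (DtdLessDoctype signal) {
            return false;
        } catch (SAXParseException recorded) {
            // Fatal error: fatalError() has already stored the message.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return true;
    }

    /** Validating pass first; on a DTD-less DOCTYPE, reparse for well-formedness only. */
    static List<String> check(String xml) {
        Handler first = new Handler(true);
        if (run(xml, true, first)) return first.errors;
        Handler second = new Handler(false);
        run(xml, false, second);
        return second.errors;
    }

    public static void main(String[] args) {
        System.out.println(check("<!DOCTYPE html><html><body/></html>"));
        System.out.println(check("<!DOCTYPE html><html><body></html>"));
    }
}
```

The first pass is thrown away as soon as `endDTD()` fires, so the cost of the restart is limited to re-reading the prolog. Note that under this sketch's rule an internal subset containing only entity declarations is also treated as DTD-less.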

2016-02-09 11:43:20.006000
will69

Hi Eric. Thanks for following up on this! There is an HTML and an XHTML syntax for HTML5. The XHTML variant is, of course, an application of xml. UTF-8/UTF-16 is the default encoding for any xml application. [See here](https://wiki.whatwg.org/wiki/HTML_vs._XHTML) for a comparison of HTML5 and XHTML5. So is this actually a problem with Xerces and should be filed upstream? What about reading the first 15 characters and switching validation off, before invoking the parser? jEdit reads the first line of a file to determine the file type anyway, doesn't it?

We are using jEdit in an educational context and XHTML/XHTML5 will probably always be the most common application of xml.
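
The "read the first characters and switch validation off" suggestion could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `DoctypeSniffer`, `isDtdLessHtml5` and `factoryFor` are hypothetical names, the check tolerates an optional byte order mark and XML declaration before the DOCTYPE, and it does not handle comments or processing instructions in the prolog. It says nothing about how jEdit itself inspects the first line of a buffer.

```java
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;

class DoctypeSniffer {

    // Optional BOM, optional XML declaration, then exactly <!DOCTYPE html>
    // with no public/system identifier and no internal subset.
    private static final Pattern DTD_LESS_HTML5 = Pattern.compile(
            "\\uFEFF?(?:<\\?xml[^>]*\\?>)?\\s*<!DOCTYPE\\s+html\\s*>");

    /** True if the start of the document is a DTD-less html DOCTYPE. */
    static boolean isDtdLessHtml5(String head) {
        return DTD_LESS_HTML5.matcher(head).lookingAt();
    }

    /** Builds a parser factory that validates only when the sniffed head is not DTD-less. */
    static SAXParserFactory factoryFor(String head) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(!isDtdLessHtml5(head));
        return factory;
    }

    public static void main(String[] args) {
        System.out.println(isDtdLessHtml5("<!DOCTYPE html>\n<html/>"));
        System.out.println(isDtdLessHtml5("<!DOCTYPE html SYSTEM \"about:legacy-compat\">"));
    }
}
```

Sniffing avoids a second parse, but it duplicates a small part of the parser's prolog handling, so unusual prologs can slip past it.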

2016-02-09 21:45:08.404000
kerik-sf

Hi will69,
I understand better now why you would want the XML sidekick (for XHTML5).
Try this build: http://www.elelay.fr/XML-788d001.zip
It is https://sourceforge.net/p/jedit/svn/24305/tree/plugins/XML/trunk/
To use it, drop it in the %JEDIT_SETTINGS%/jars folder (you can find which folder it is via the Utilities > Settings Directory menu).
It does not report errors on https://sourceforge.net/p/jedit/svn/24305/tree/plugins/XML/trunk/test_data/dtd/html5.xml

2016-02-10 11:09:04.618000
will69

This works great! It even starts validating again as soon as I use an internal subset.
This is really helpful! Thanks a lot, Eric!

2016-02-10 18:03:51.768000
kerik-sf

Good to know that it works for you.

If you want to use the internal subset for entities but not validation, you can tell jEdit to turn off DTD validation by inserting
<!-- :xml.validate.ignore-dtd=true: -->
at the top of the document.
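
The difference between using the internal subset for entities and validating against it can be seen with a plain non-validating parser. The sketch below uses the JDK's DOM parser, not the plugin's `xml.validate.ignore-dtd` handling; the class `EntitiesWithoutValidation`, the method `bodyText` and the sample `nbsp` entity are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntitiesWithoutValidation {

    /** Parses without validation and returns the text content of the root element. */
    static String bodyText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // entities are still read from the internal subset
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE html [<!ENTITY nbsp \"&#160;\">]>"
                + "<html><body>a&nbsp;b</body></html>";
        // The entity is expanded although html and body are never declared.
        System.out.println(bodyText(xml).equals("a\u00A0b"));
    }
}
```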

2016-02-12 21:52:34.101000
kerik-sf

- **status**: open --> pending-fixed
- **Group**: -->

2016-02-12 21:52:34.563000
kerik-sf

Will be in the next release.